    Categorising the World into Local Climate Zones -- Towards Quantifying Labelling Uncertainty for Machine Learning Models

    Image classification is often prone to labelling uncertainty. To generate suitable training data, images are labelled according to the evaluations of human experts. This can result in ambiguities, which affect subsequent models. In this work, we aim to model the labelling uncertainty in the context of remote sensing and the classification of satellite images. We construct a multinomial mixture model given the evaluations of multiple experts. This is based on the assumption that the image class itself is unambiguous and that the ambiguity lies in the experts' opinions about it. The model parameters can be estimated by a stochastic Expectation Maximization algorithm. Analysing the estimates gives insights into sources of label uncertainty. Here, we focus on the general class ambiguity, the heterogeneity of experts, and the origin city of the images. The results are relevant for all machine learning applications where image classification is pursued and labels are assigned by humans.
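The mixture idea in this abstract can be illustrated with a small sketch. It is not the authors' implementation: it uses plain deterministic EM instead of the stochastic EM named in the abstract, and the function name `em_multinomial_mixture`, the near-diagonal initialisation, and the smoothing constant are all assumptions made for illustration. Each image is represented by a vector of expert vote counts, and the latent variable is the unobserved image class.

```python
import math

def em_multinomial_mixture(counts, n_iter=50, smooth=1e-3):
    """Fit a multinomial mixture to expert votes with deterministic EM.

    counts[i][k] is the number of experts who voted class k for image i.
    Returns (pi, theta, resp): mixing weights, per-class vote probabilities,
    and the posterior class responsibilities for every image.
    """
    n, K = len(counts), len(counts[0])
    pi = [1.0 / K] * K
    # theta[z][k]: probability of a vote for k when the latent class is z.
    # Assumed near-diagonal start so latent class z lines up with label z.
    theta = [[0.7 if k == z else 0.3 / (K - 1) for k in range(K)]
             for z in range(K)]
    resp = []
    for _ in range(n_iter):
        # E-step: posterior over the latent class, computed in log space.
        resp = []
        for row in counts:
            logp = [math.log(pi[z])
                    + sum(c * math.log(theta[z][k]) for k, c in enumerate(row))
                    for z in range(K)]
            top = max(logp)
            weights = [math.exp(v - top) for v in logp]
            total = sum(weights)
            resp.append([w / total for w in weights])
        # M-step: smoothed updates keep every probability strictly positive.
        pi = [(smooth + sum(r[z] for r in resp)) / (n + K * smooth)
              for z in range(K)]
        for z in range(K):
            raw = [smooth + sum(r[z] * row[k] for r, row in zip(resp, counts))
                   for k in range(K)]
            norm = sum(raw)
            theta[z] = [v / norm for v in raw]
    return pi, theta, resp

# Hypothetical votes from 10 experts on 6 images with 3 classes.
votes = [[9, 1, 0], [8, 2, 0], [1, 9, 0], [0, 8, 2], [0, 1, 9], [1, 0, 9]]
pi, theta, resp = em_multinomial_mixture(votes)
labels = [max(range(3), key=lambda z: r[z]) for r in resp]
```

The rows of `theta` then indicate which classes the experts tend to confuse with each other, which is the kind of insight into label uncertainty the abstract refers to.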

    Towards Label Embedding -- Measuring classification difficulty

    Uncertainty quantification in machine learning is a timely and vast field of research. In supervised learning, uncertainty can already occur in the very first stage of the training process, the labelling step. In particular, this is the case when not every instance can be unambiguously classified. The problem occurs when classes overlap or instances cannot be clearly categorised. In other words, there is inevitable ambiguity in the annotation step and not necessarily a 'ground truth'. As an example, we consider the classification of satellite images. Each image is annotated independently by multiple labellers and classified into local climate zones (LCZs). For each instance we have multiple votes, leading to a distribution of labels rather than a single value. The main idea of this work is that we do not assume a ground truth label but embed the votes into a K-dimensional space, with K as the number of possible categories. The embedding is derived from the voting distribution in a Bayesian setup, modelled via a Dirichlet-Multinomial model. We estimate the model and posteriors using a stochastic Expectation Maximisation algorithm with Markov Chain Monte Carlo steps. While we focus on the particular example of LCZ classification, the methods developed in this paper readily extend to other situations where multiple annotators independently label texts or images. To demonstrate this, we also apply our approach to two other benchmark datasets for image classification. Besides the embeddings themselves, we can investigate the resulting correlation matrices, which can be seen as generalised confusion matrices and reflect the semantic similarities of the original classes very well for all three example datasets. The insights gained are valuable and can serve as a general label embedding if a single ground truth per observation cannot be guaranteed.
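As a rough illustration of the Dirichlet-Multinomial building block, the sketch below maps one image's votes to a point in K-dimensional space through the conjugate posterior mean. It assumes a fixed, known prior `alpha`, whereas the abstract describes estimating the model with stochastic EM and MCMC steps; the function name `posterior_mean_embedding` and the example numbers are hypothetical.

```python
def posterior_mean_embedding(votes, alpha):
    """Posterior mean of the class probabilities under a Dirichlet prior.

    votes[k] is the number of labellers who chose class k, and alpha[k] is the
    (assumed, fixed) Dirichlet prior parameter for class k. By conjugacy the
    posterior is Dirichlet(alpha + votes), whose mean is returned.
    """
    posterior = [a + v for a, v in zip(alpha, votes)]
    total = sum(posterior)
    return [p / total for p in posterior]

# Hypothetical example: 10 labellers, K = 3 classes, uniform prior.
embedding = posterior_mean_embedding([7, 3, 0], [1.0, 1.0, 1.0])
```

The result is a soft label on the probability simplex. A class with zero votes still receives a small positive weight from the prior, so no class is ruled out entirely.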

    Advances in Uncertainty-Guided Local Climate Zone Classification

    Like many other research fields, remote sensing has been greatly impacted by machine and deep learning and benefits from technological and computational advances. In recent years, considerable effort has been spent on deriving models that are not just accurate but also reliable, in the sense that they yield a measure of predictive uncertainty. In the particular framework of image classification, reliability is validated, for example, by cross-checking the model’s confidence in its predictions against the resulting accuracy. Predictive uncertainties, on the other hand, can be used, for example, to determine expressive data samples. We investigate model reliability in the framework of Local Climate Zone (LCZ) classification, using the So2Sat LCZ42 [1] data set, which comprises Sentinel-1 and Sentinel-2 image pairs. [1] X. X. Zhu, J. Hu, C. Qiu, Y. Shi, J. Kang, L. Mou, H. Bagheri, M. Haberle, Y. Hua, R. Huang et al., “So2sat lcz42: a benchmark data set for the classification of global local climate zones [software and data sets],” IEEE Geoscience and Remote Sensing Magazine, vol. 8, no. 3, pp. 76–89, 2020.
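One common way to cross-check confidence against accuracy is the expected calibration error (ECE). The abstract does not say which measure the authors use, so the sketch below is an assumption for illustration only, and the function name, bin count, and example values are hypothetical.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """Weighted average gap between mean confidence and accuracy per bin.

    confidences[i] is the model's confidence in its prediction for sample i,
    in [0, 1], and correct[i] says whether that prediction was right.
    """
    n = len(confidences)
    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        # Equal-width bins; a confidence of exactly 1.0 goes in the last bin.
        index = min(int(conf * n_bins), n_bins - 1)
        bins[index].append((conf, ok))
    ece = 0.0
    for members in bins:
        if not members:
            continue
        mean_conf = sum(c for c, _ in members) / len(members)
        accuracy = sum(1 for _, ok in members if ok) / len(members)
        ece += len(members) / n * abs(mean_conf - accuracy)
    return ece

# Hypothetical predictions: four at 0.9 confidence, two at 0.6.
ece = expected_calibration_error(
    [0.9, 0.9, 0.9, 0.9, 0.6, 0.6],
    [True, True, True, False, True, False],
)
```

A value near zero means the stated confidences match the observed accuracy. Larger values indicate over- or under-confidence.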